Asynchronous Distributed Data Parallelism for Machine Learning

Authors

  • Zheng Yan
  • Yunfeng Shao
Abstract

Distributed machine learning has gained much attention due to the recent proliferation of large-scale learning problems. Designing a high-performance framework poses many challenges and opportunities for system engineers. This paper presents a novel architecture for solving distributed learning problems in the framework of data parallelism, where model replicas are trained over multiple worker nodes. Worker nodes are grouped into worker groups, which enables model replicas to be aggregated asynchronously via peer-to-peer communication. The merits of this framework include elastic scalability, fault tolerance, and efficient communication.
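The architecture the abstract describes (model replicas held by worker nodes, merged asynchronously via peer-to-peer communication rather than a central parameter server) can be sketched in highly simplified form. This is an illustrative toy, not the authors' implementation: each replica is a single scalar weight, the objective is a toy quadratic, and all class and function names are invented for the example.

```python
import threading
import random

class Worker:
    """One worker node holding a model replica (here, a single scalar)."""
    def __init__(self, wid, lr=0.1):
        self.wid = wid
        self.weight = random.uniform(-1.0, 1.0)  # model replica
        self.lock = threading.Lock()
        self.lr = lr

    def local_step(self):
        # Toy objective: minimize (w - 3)^2, so the gradient is 2*(w - 3).
        with self.lock:
            grad = 2.0 * (self.weight - 3.0)
            self.weight -= self.lr * grad

    def gossip_with(self, peer):
        # Asynchronous peer-to-peer aggregation: a pair of replicas is
        # averaged without any global barrier across the worker group.
        if peer is self:
            return
        first, second = sorted([self, peer], key=lambda w: w.wid)
        with first.lock, second.lock:  # fixed lock order avoids deadlock
            avg = 0.5 * (self.weight + peer.weight)
            self.weight = peer.weight = avg

def train(workers, steps=200):
    """Run every worker in its own thread: local steps plus random gossip."""
    def run(worker):
        for _ in range(steps):
            worker.local_step()
            worker.gossip_with(random.choice(workers))
    threads = [threading.Thread(target=run, args=(w,)) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

random.seed(0)
group = [Worker(i) for i in range(4)]  # one worker group of 4 replicas
train(group)
# All replicas should end up close to the optimum w* = 3.
print([round(w.weight, 3) for w in group])
```

Because gossip happens pairwise under per-worker locks, no worker ever waits on the whole group — which is the asynchrony the abstract credits for elastic scalability and fault tolerance.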


Similar Papers

Distributed Machine Learning: Foundations, Trends, and Practices

In recent years, artificial intelligence has achieved great success in many important applications. Both novel machine learning algorithms (e.g., deep neural networks) and their distributed implementations play critical roles in this success. In this tutorial, we will first review popular machine learning algorithms and the optimization techniques they use. Second, we will introduce widely...


Asynchronous Decentralized Parallel Stochastic Gradient Descent

Recent work shows that decentralized parallel stochastic gradient descent (D-PSGD) can outperform its centralized counterpart both theoretically and practically. While asynchronous parallelism is a powerful technique for improving the efficiency of parallelism in distributed machine learning platforms and has been widely used in many popular machine learning software packages and solvers based on centrali...


Thesis Proposal Parallel Learning and Inference in Probabilistic Graphical Models

Probabilistic graphical models are one of the most influential and widely used techniques in machine learning. Powered by exponential gains in processor technology, graphical models have been successfully applied to a wide range of increasingly large and complex real-world problems. However, recent developments in computer architecture, large-scale computing, and data-storage have shifted the f...


ASAP: Asynchronous Approximate Data-Parallel Computation

Emerging workloads, such as graph processing and machine learning, are approximate because of the scale of data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines using bulk-synchronous processing (BSP) or other synchronous processing paradigms such as map-reduce. However, data parallel processing primitives such as repe...


Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud

While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction which naturally expresses asynchronou...




Publication date: 2015